A Sentence-Level Hierarchical BERT Model for Document Classification with Limited Labelled Data

Abstract

The emergence of transformer models like BERT means that deep learning language models can achieve reasonably good performance in document classification with few labelled instances. However, there is a lack of evidence for the utility of applying BERT-like models to long document classification in few-shot scenarios. This paper introduces a long-text-specific model, the Hierarchical BERT Model (HBM), that learns sentence-level features of a document and works well in few-shot scenarios. Evaluation experiments demonstrate that HBM can, with only 50 to 200 labelled instances, achieve higher performance in document classification than existing state-of-the-art methods, especially when documents are long. Also, as an extra benefit of HBM, the salient sentences identified by a learned HBM are useful as explanations for document classifications. A user study demonstrates that highlighting these salient sentences is an effective way to speed up the annotation required in interactive machine learning approaches like active learning.

Similar resources

Clustering with Propagation for Hierarchical Document Classification

We address the problem of unsupervised classification of documents into a given hierarchy of concepts with few unlabeled examples. In contrast to various previous approaches where only the leaves of the hierarchy represent valid classes, we consider the situation where documents must also be classified into internal nodes. We claim that the relationships between classes are part of the a priori...

A New Document Embedding Method for News Classification

Text classification is one of the main tasks of natural language processing (NLP). In this task, documents are classified into pre-defined categories. A large amount of news spreads on the web. A text classifier can categorize news automatically, which facilitates and accelerates access to the news. The first step in text classification is to represent documents in a suitable way t...

A Model for Project Selecting with Limited Resources in Data Envelopment Analysis with Input and Output Fuzzy

In evaluating performance, selecting a subset from a set of solutions with limited resources is essential. If there is more than one input and output, data envelopment analysis optimization models are evaluated, and performance is measured as the weighted output divided by the weighted input. In this research, two optimization models with limited resources are presented from data envelopment ...

An Improved Hierarchical Bayesian Model of Language for Document Classification

This paper addresses the fundamental problem of document classification, and we focus attention on classification problems where the classes are mutually exclusive. In the course of the paper we advocate an approximate sampling distribution for word counts in documents, and demonstrate the model’s capacity to outperform both the simple multinomial and more recently proposed extensions on the cl...

Relational Data Model in Document Hierarchical Indexing

One of the problems of the development of document indexing and retrieval applications is the usage of hierarchies. In this paper we describe a method of automatic hierarchical indexing using the traditional relational data model. The main idea is to assign continuous numbers to the words (grammatical forms of the words) that characterize the nodes in the hierarchy (concept tree). One of the ad...

Journal

Journal title: Lecture Notes in Computer Science

Year: 2021

ISSN: 1611-3349, 0302-9743

DOI: https://doi.org/10.1007/978-3-030-88942-5_18